Tip: This page is generated from a Jupyter notebook. Some of the code is hidden, and some of it can be shown by clicking the Show Code button. To see the complete notebook, click the View on GitHub button above.

Introduction

Undoubtedly, the recent appearance and spread of the COVID-19 virus has affected the lives of billions of people worldwide in many respects. Governments have been under constant pressure to reduce social interaction in order to limit the transmission of the virus. They have therefore introduced strict measures that have a significant impact on everybody's life.

The economy was the first area affected by those measures. Work culture had to change to comply with government directives, which pushed companies to move faster towards digitalisation. As a result, companies that were reluctant to make such changes faced serious financial problems, in many cases forcing them to reduce their staff. For other companies, such as travel agencies or companies in the hospitality sector, the hit was even harder, since their profits rely entirely on people's need for entertainment, social exploration and so on. Many of them have completely or partially shut down their operations, leaving many people unemployed.

These are common observations, and they may look discouraging and demotivating to many people. However, we cannot conclude how large the impact is on each country's overall economy without an in-depth investigation of the actual data.

We therefore decided to analyse data from both a microeconomic and a macroeconomic point of view in order to get a clearer understanding of how the virus has affected our economy.

To sum up, this study aims to provide a clear conclusion about the economic consequences of COVID-19, based on analysis of reliable sources. Through interactive and annotated graphs, we want to give the intended audience all the information needed to understand the impact of COVID-19 on the economy in a simple and concise manner.

Data Analysis

In this study we analyse data from all the countries directly affected by COVID-19, with a particular focus on Denmark. We start by presenting a statistical analysis of the COVID-19 situation in the major countries. We then bring in financial data to explore whether the virus has had a significant impact on the economy and which countries have been affected the most. To carry out the analysis we use data from the IMF, the OECD and other sources, which are listed at the end of the page. We chose these datasets because we believe they contain all the information needed to assess the financial impact of COVID-19.

COVID-19 analysis

In this section we dive into the COVID-19 data to present the current situation by illustrating the numbers of confirmed cases and deaths across major countries. Then, with the help of interactive representations of those numbers, we try to understand the spread rate and distribution of COVID-19.

The following table shows a sample of the COVID-19 data. The dataset contains columns for the date, the country, and the confirmed cases, recovered cases and overall deaths per country.

Date Country Confirmed Recovered Deaths
19069 2020-05-02 West Bank and Gaza 353 76 2
19070 2020-05-02 Western Sahara 6 5 0
19071 2020-05-02 Yemen 10 1 2
19072 2020-05-02 Zambia 119 75 3
19073 2020-05-02 Zimbabwe 34 5 4

Exploration analysis

In this section we perform a basic statistical analysis of the data in order to identify how the data are distributed among the columns and to detect any important patterns that might be useful in the further analysis.

We start with the descriptive statistics of our dataset, which summarize the central tendency, dispersion and shape of its distribution.

The table below shows large differences in the maximum values among the three case types. The standard deviation is high in all three columns, which means that the data are widely spread. The 25th and 50th percentiles for recovered cases and deaths are zero, while the 75th percentiles are 19 and 4, respectively.

Confirmed Recovered Deaths
count 1.907400e+04 19074.000000 19074.000000
mean 4.216215e+03 1130.094841 273.549754
std 3.400320e+04 7936.359907 2343.467559
min 0.000000e+00 0.000000 0.000000
25% 0.000000e+00 0.000000 0.000000
50% 5.000000e+00 0.000000 0.000000
75% 2.417500e+02 19.000000 4.000000
max 1.132539e+06 175382.000000 66369.000000
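
A median of zero next to a mean in the hundreds is what one would expect from strongly right-skewed data. The sample table suggests the dataset holds one row per country per day, so many early rows would still be zero. The following is a minimal sketch of that effect using `DataFrame.describe()`; the `toy` frame and its values are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy cumulative counts (made-up values, NOT from the real dataset)
toy = pd.DataFrame({
    "Confirmed": [0, 0, 5, 240, 1000],
    "Deaths": [0, 0, 0, 4, 60],
})

# describe() returns count, mean, std, min, quartiles and max per column
summary = toy.describe()

# A mean far above the median (the 50% row) signals a right-skewed column
print(summary.loc[["mean", "50%", "max"]])
```

Comparing the `mean` and `50%` rows is a quick way to see that a few very large values dominate the average.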

The three figures below show how the cases are distributed across countries (for simplicity and space, only countries with at least 1000 deaths are shown).

The figures show the maximum value of each case type per country, in order to identify which countries have recorded the highest numbers of confirmed cases, recoveries and deaths due to COVID-19. Narrowing down to the top five, France, Spain, the USA, Italy and the United Kingdom have the highest numbers of both confirmed cases and deaths, while the top five for recovered cases are the USA, Italy, Spain, Germany and China. Looking at deaths, it is remarkable how many more deaths the five most affected countries have than the rest.

# collapse-show

# select several columns with a list (double brackets); tuple-style selection is not supported by recent pandas
group = full_clean_data.groupby('Country')[['Deaths','Confirmed','Recovered']].max().sort_values(by=['Deaths','Confirmed','Recovered'])
group = group.reset_index()
# keep only the countries with at least 1000 deaths
new_group = group.query("Deaths >= 1000")


#define colors
red = alt.value('#f54242')
green = alt.value('#137E2A')
black = alt.value('#050404')

#presenting the confirmed cases per country
bars = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Confirmed:Q',
    y=alt.Y("Country:O", sort='-x'),color = red
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Confirmed:Q',color =black
)

bars2 = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Recovered:Q',
    y=alt.Y("Country:O", sort='-x'),color=green
)


text2 = bars2.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Recovered:Q',color=black
)

bars3 = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Deaths:Q',
    y=alt.Y("Country:O", sort='-x'),color=black
)


text3 = bars3.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Deaths:Q',color=black
)



laydermap = (bars + text).properties(width= 250,height=300)|(bars2+text2).properties(width= 250,height=300)|(bars3+text3).properties(width=250,height=300)
laydermap.configure_axis(grid=False).configure_view(strokeWidth=0)
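
The top-five ranking discussed above can also be read directly from the data rather than from the bar charts. Here is a minimal sketch, assuming cumulative counts per country and date as in the dataset; the `toy` frame, its country names and its values are hypothetical.

```python
import pandas as pd

# Toy cumulative data (illustrative values only)
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Deaths": [1, 5, 10, 30, 0, 2],
    "Confirmed": [10, 50, 100, 200, 5, 20],
})

# Cumulative series peak at their latest value, so max() gives each country's total
totals = toy.groupby("Country")[["Deaths", "Confirmed"]].max()

# nlargest keeps the n rows with the highest value in the given column
top2 = totals.nlargest(2, "Deaths")
print(top2)
```

Replacing `2` with `5` and `"Deaths"` with `"Confirmed"` or `"Recovered"` would give the corresponding top-five lists.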

Having looked at the basic distribution of the data across all countries, we decided to focus on a few major countries in order to make our analysis more robust. In the rest of the study we therefore focus on the following countries:

  • China: Because it is the place where COVID-19 first appeared.
  • Denmark: This is the country where this study was carried out.
  • USA, UK, Italy, Spain, France: These are the countries most affected by the pandemic.

Further data exploration and preparation

In order to extract as much information as possible, it is necessary to combine several datasets. By doing so, we add columns for daily new cases, new deaths and new recovered cases. In addition, missing values must be investigated and treated to bring the dataset into a form ready for analysis. In the present study, the missing values were filled with zeros. This was considered the best treatment because filling them with the mean, mode or median, for example, could lead to a false interpretation of the results.

The following tables show first a sample of the final COVID-19 dataset after preprocessing, and then the descriptive statistics of that dataset.

# collapse-show
# data processing to create Active, New cases, New deaths, New recovered
full_clean_data['Active'] = full_clean_data['Confirmed'] - full_clean_data['Recovered'] - full_clean_data['Deaths']

countries = ['US', 'Italy', 'China', 'Spain', 'France', 'Iran', 'United Kingdom', 'Denmark']
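# take an explicit copy so the column assignments below modify selected_data itself, not a view of full_clean_data
selected_data = full_clean_data[full_clean_data['Country'].isin(countries)].copy()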

for i in selected_data.index:
    date = selected_data.loc[i, 'Date']
    country = selected_data.loc[i, 'Country']
    date = datetime.strptime(date, '%Y-%m-%d')
    yesterday = datetime.strftime(date - timedelta(1), '%Y-%m-%d')
    yesterdayData = selected_data.loc[(selected_data.Date == yesterday) & (selected_data.Country == country)]
    if len(yesterdayData) <= 0:
        selected_data.loc[i, 'New cases'] = 0
        selected_data.loc[i, 'New deaths'] = 0
        selected_data.loc[i, 'New recovered'] = 0
        continue
    yesterdayData = yesterdayData.iloc[0]
    selected_data.loc[i, 'New cases'] = selected_data.loc[i, 'Confirmed'] - yesterdayData.Confirmed
    selected_data.loc[i, 'New deaths'] = selected_data.loc[i, 'Deaths'] - yesterdayData.Deaths
    selected_data.loc[i, 'New recovered'] = selected_data.loc[i, 'Recovered'] - yesterdayData.Recovered

selected_data = selected_data.fillna(value=0)
selected_data['New cases'] = selected_data['New cases'].astype(int)
selected_data['New deaths'] = selected_data['New deaths'].astype(int)
selected_data['New recovered'] = selected_data['New recovered'].astype(int)
Date Country Confirmed Recovered Deaths Active New cases New deaths New recovered
18968 2020-05-02 Iran 96448 77350 6156 12942 802 65 1032
18972 2020-05-02 Italy 209328 79914 28710 100704 1900 474 1665
19044 2020-05-02 Spain 216582 117248 25100 74234 3147 557 5198
19060 2020-05-02 US 1132539 175382 66369 890788 29078 1426 11367
19064 2020-05-02 United Kingdom 183500 896 28205 154399 4815 622 4
Confirmed Recovered Deaths Active New cases New deaths New recovered
count 8.100000e+02 810.000000 810.000000 810.000000 810.000000 810.000000 810.000000
mean 6.571904e+04 16897.351852 5075.017284 43746.675309 2604.774074 223.482716 718.943210
std 1.471835e+05 29178.478531 9996.622673 121351.038650 6164.895180 443.875914 1818.560247
min 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.100000e+01 1.000000 0.000000 7.000000 0.000000 0.000000 0.000000
50% 6.467000e+03 192.500000 225.500000 2362.000000 155.000000 8.000000 10.000000
75% 8.164350e+04 21802.000000 4623.250000 35503.250000 2551.250000 217.250000 874.000000
max 1.132539e+06 175382.000000 66369.000000 890788.000000 36188.000000 2612.000000 33227.000000
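
The day-over-day differences computed by the loop above can also be obtained without iterating over rows. The following is a minimal sketch using `groupby(...).diff()`, assuming one row per country per consecutive day; the `toy` frame is hypothetical. Unlike the loop, which looks up the previous calendar day explicitly, `diff()` compares each row with the previous row of the same country, so any gap in the dates would be treated as a single step.

```python
import pandas as pd

# Toy cumulative counts for two hypothetical countries
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "Country": ["A"] * 3 + ["B"] * 3,
    "Confirmed": [1, 4, 9, 0, 2, 2],
})

# Sort so that each country's rows are in chronological order
toy = toy.sort_values(["Country", "Date"])

# diff() within each country gives the daily increase; the first day has no
# predecessor and is filled with 0, matching the zero-fill used in the study
toy["New cases"] = toy.groupby("Country")["Confirmed"].diff().fillna(0).astype(int)
print(toy)
```

The same pattern would apply to the `Deaths` and `Recovered` columns.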

COVID-19 presentation in Denmark

The figure below shows the total confirmed cases and deaths in Denmark from the day the virus appeared in the country (approximately January 22) until April 23.

It seems that even during the lockdown (10/03/2020 - 20/03/2020) the numbers of confirmed cases and deaths showed an increasing trend. It should be highlighted, however, that the purpose of the lockdown was to keep these numbers low enough not to exceed the number of cases that the health system can handle.

# collapse-hide
denmark = selected_data[selected_data['Country'] == 'Denmark']

base = alt.Chart(denmark).mark_bar().encode(
    x='monthdate(Date):O',
).properties(
    width=250
)

red = alt.value('#f54242')
base.encode(y='Confirmed').properties(title='Total Confirmed') | base.encode(y='Deaths', color=red).properties(title='Total Deaths')|base.encode(y='New cases').properties(title='Daily New Cases') | base.encode(y='New deaths', color=red).properties(title='Daily New Deaths')

Moving on, the same figure shows the number of daily new cases (third panel from the left) and daily new deaths (fourth panel from the left) in Denmark during the same period.

From the start of the lockdown (around March 10) until March 13, about 170 new cases were recorded daily, whereas on March 14 the number of recorded cases dropped significantly, by approximately 75%. Between March 24 and April 9, the number of confirmed cases reached its peak, with an average of 300 cases per day. The recorded cases then began to drop again, reaching an average of 150 cases per day at the time of writing (10/05/2020 - 23/05/2020).

The number of daily deaths peaked between April 3 and April 9 and dropped by approximately 50% after that. At the time of writing, the number of deaths per day does not exceed 9. Overall, we can see that the measures against the virus were followed by a reduction in deaths and confirmed cases after their implementation.
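
Period averages and percentage drops like the ones quoted above can be computed with a date mask. Here is a minimal sketch; the helper `mean_daily` is a hypothetical name, and the `toy` values are invented to illustrate the arithmetic, not taken from the Danish data.

```python
import pandas as pd

# Toy daily counts (made-up values, NOT the real Danish figures)
toy = pd.DataFrame({
    "Date": pd.date_range("2020-03-10", periods=6, freq="D"),
    "New cases": [160, 180, 170, 170, 40, 50],
})

def mean_daily(df, start, end, column="New cases"):
    """Average of `column` over the inclusive date range [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = df["Date"].between(start, end)
    return df.loc[mask, column].mean()

before = mean_daily(toy, "2020-03-10", "2020-03-13")
after = mean_daily(toy, "2020-03-14", "2020-03-15")

# Relative drop between the two periods, in percent
drop_pct = (before - after) / before * 100
print(before, after, round(drop_pct, 1))
```

Applied to the Denmark subset, with `Date` converted to datetimes first, the same helper could be used to check the averages stated in the text.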

#collapse-hide
base = alt.Chart(selected_data).mark_bar().encode(
    x='monthdate(Date):O',
).properties(width=500)

base.encode(y='Confirmed',color='Country').properties(title = 'Total confirmed')|base.encode(y='Deaths',color = 'Country').properties(title='Total deaths')

#collapse-hide

base = alt.Chart(selected_data).mark_bar().encode(
    x='monthdate(Date):O',color="Country"
    
).properties(width=600,height=300)

base.encode(y=alt.Y("New cases:Q"))|base.encode(y=alt.Y("New deaths:Q"))

After investigating new cases and deaths, we would like to check how the death rate (deaths as a percentage of confirmed cases) has developed in each of the countries, which is what the figure below shows. Hovering over a line gives the exact death rate for that country. As we can see, the death rate tends to increase as the virus spreads. China is the only country that recorded a steady level, from March 12 to April 12. Surprisingly, the USA has a relatively low death rate considering the high number of cases recorded over the last couple of months. Another interesting observation is the curve in the first half of March in the case of France, and how it goes up again within only 1.5 months.

# collapse-hide
# data processing
selected_data['DeathRate'] = selected_data['Deaths'] / selected_data['Confirmed'] * 100
selected_data = selected_data.fillna(value=0)

# plot
highlight = alt.selection(type='single', on='mouseover',
                          fields=['Country'], nearest=True)

base = alt.Chart(selected_data, title='Death Rate Among Major Countries').transform_filter(
    alt.datum.Country != 'Iran'
).encode(
    x='Date:T',
    y=alt.Y('DeathRate:Q', title= 'Death Rate %'),
    color='Country:N',
    tooltip = [alt.Tooltip('DeathRate'),
               alt.Tooltip('Country'),
              ],
)

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width=600
)

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3))
)

points + lines

#collapse-hide

population = {'Denmark':5792202,
             'China':1408526202,
             'France':65273511,
             'Italy':60461826,
             'Spain':46754775,
             'US':331002651,
             'United Kingdom':67886011}


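# divide each row by its own country's population; the previous nested loop overwrote
# the whole column on every match, so all rows ended up using a single population value.
# countries missing from the dict (Iran) become NaN and are filtered out of the plot below
selected_data['InfectionRate'] = selected_data['Confirmed'] / selected_data['Country'].map(population) * 100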

# plot
highlight = alt.selection(type='single', on='mouseover',
                          fields=['Country'], nearest=True)

base = alt.Chart(selected_data, title='Infection Rate Among Major Countries').transform_filter(
    alt.datum.Country != 'Iran'
).encode(
    x='Date:T',
    y=alt.Y('InfectionRate:Q', title= 'Infection Rate %'),
    color='Country:N',
    tooltip = [alt.Tooltip('InfectionRate'),
               alt.Tooltip('Country'),
              ],
)

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width=700
)

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3))
)

points + lines

Overview of COVID-19 current distribution worldwide

Now we would like to illustrate how COVID-19 is distributed among the analysed countries. The first plot shows the relation between confirmed cases and deaths from the day of the first diagnosed case up to now. Moving the slider under the plot shows how deaths increase day by day. It is striking how many more deaths than in other countries were recorded in the USA in only 60 days (at the time the report was written).

# collapse-hide
# data processing
start_date = pd.Timestamp('2020-01-22')
# days elapsed since the start date, computed for the whole column at once
selected_data['Day'] = (pd.to_datetime(selected_data['Date']) - start_date).dt.days.astype(int)
# plot
select_date = alt.selection_single(
    name='select', fields=['Day'], init={'Day': 0},
    bind=alt.binding_range(min=0, max=selected_data.Day.max(), step=1)
)
alt.Chart(selected_data, title='COVID-19 Spread Over Time').transform_filter(
    alt.datum.Country != 'Iran').mark_point(filled=True).encode(
    alt.X('Confirmed', scale=alt.Scale(zero=False)),
    alt.Y('Deaths', scale=alt.Scale(zero=False)),
    alt.Size('Active'),
    alt.Color('Country'),
    alt.Order('Confirmed', sort='descending'),
    tooltip = [alt.Tooltip('Confirmed'),
               alt.Tooltip('Deaths'),
               alt.Tooltip('Active')
              ],
).properties(
    width=750,
    height=400
).add_selection(select_date).transform_filter(select_date)

The figure below shows how COVID-19 has spread among the major countries and how they compare to Denmark. China, where COVID-19 first appeared, shows a sharp increase in the number of cases per day during February and manages to bring those numbers down in a relatively short period thanks to strict measures. In the rest of the countries (apart from Denmark), which did not apply strict measures in time, we observe a sharp increase in new cases and no significant drop since those numbers reached their peak. In the case of the USA and the UK, the numbers seem to keep increasing.

# collapse-hide
# plot
interval = alt.selection_interval()

circle = alt.Chart(selected_data, title='Spread and New Cases Over Time').transform_filter(
    alt.datum.Country != 'Iran').mark_circle().encode(
    x='monthdate(Date):O',
    y='Country',
    color=alt.condition(interval, 'Country', alt.value('lightgray')),
    size=alt.Size('New cases:Q',
        scale=alt.Scale(range=[0, 3000]),
        legend=alt.Legend(title='Daily new cases')
    ) 
).properties(
    width=1000,
    height=400,
    selection=interval
)

bars = alt.Chart(selected_data).mark_bar().encode(
    y='Country',
    color='Country',
    x='sum(New cases):Q'
).properties(
    width=1000
).transform_filter(
    interval
)

circle & bars

# collapse-hide
# data preparation: combine the reference dataset with the virus dataset to obtain the area code for the map plot
refrence = refrence.rename(columns={'Country_Region': 'Country/Region'})
most_recent_data = world_data[world_data['Date'] == world_data['Date'].max()]
most_recent_data = most_recent_data[['Date', 'Country/Region', 'Confirmed','Recovered','Deaths']]
grouped = most_recent_data.groupby('Country/Region').sum()

result = grouped.join(refrence.set_index('Combined_Key'), on='Country/Region')
result = result.fillna(value=0)
result['code3'] = result['code3'].astype(int)

# plot
alt.Chart(alt.topo_feature(data.world_110m.url, 'countries'), title='Confirmed Cases Map').mark_geoshape(
    stroke='#aaa', strokeWidth=0.25
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=result, key='code3', fields=['Confirmed'])
).encode(
    alt.Color('Confirmed:Q',
              scale=alt.Scale(domain=[0, result.Confirmed.max()/10], clamp=True), 
              legend=alt.Legend(format='')),
    alt.Tooltip('Confirmed:Q')
).project(
    type='equirectangular'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Macroeconomic

In this section we attempt an economic analysis from a macroeconomic point of view and, in relation to the COVID-19 analysis above, try to draw potential conclusions on how the spread of the virus has affected the global economy. A closer look at Denmark is given in this section as well.

Macroeconomics is a branch of economics that studies how an overall economy behaves (it focuses on the large scale). More precisely, macroeconomics studies economy-wide phenomena such as inflation, price levels, rate of economic growth, national income, gross domestic product (GDP), and changes in unemployment (Investopedia).

Stock Market

# collapse-hide
#import sectors data
chemicalsdk = pd.read_csv(path+'Copenhagen Chemicals Historical Data.csv')
consumersdk = pd.read_csv(path+'Copenhagen Consumer Goods Historical Data.csv')
servicesdk = pd.read_csv(path+'Copenhagen Consumer Services Historical Data.csv')
financialsdk = pd.read_csv(path+'Copenhagen Financials Historical Data.csv')
healthdk = pd.read_csv(path+'Copenhagen Health Care Historical Data.csv')
industrialsdk = pd.read_csv(path+'Copenhagen Industrials Historical Data.csv')
ogdk = pd.read_csv(path+'Copenhagen Oil & Gas Historical Data.csv')
realdk = pd.read_csv(path+'Copenhagen Real Estate Historical Data.csv')
technologydk = pd.read_csv(path+'Copenhagen Technology Historical Data.csv')



# stock data preprocessing
stockOMX20['Symbol'] = 'OMX 20'
stockCopenhagenAllShare['Symbol'] = 'Copenhagen All Shares'
#stockOMX25['Symbol'] = 'OMX 25'
chemicalsdk['Symbol'] = 'Chemicals'
consumersdk['Symbol'] = 'Consumer Goods'
servicesdk['Symbol'] = 'Consumer Services'
financialsdk['Symbol'] = 'Financials'
healthdk['Symbol'] = 'Health Care'
industrialsdk['Symbol'] = 'Industrials'
ogdk['Symbol'] = 'Oil & Gas'
realdk['Symbol'] = 'Real Estate'
technologydk['Symbol'] = 'Technology'



stockAll = pd.concat([stockOMX20, stockCopenhagenAllShare,chemicalsdk,consumersdk,servicesdk,financialsdk,
                     healthdk,industrialsdk,ogdk,realdk,technologydk])
stockAll['Date'] = pd.to_datetime(stockAll.Date)
stockAll = stockAll.sort_values(by=['Symbol', 'Date'])
stockAll['Price'] = stockAll['Price'].str.replace(',', '')
stockAll['Price'] = stockAll['Price'].astype(float)
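
The read, tag, concatenate and clean steps above are repeated for every market in the cells that follow. The following is a minimal sketch of how they could be folded into one helper. `load_index` is a hypothetical function, not part of the notebook, and the in-memory CSV text stands in for the exported files, whose exact layout is assumed to include `Date` and `Price` columns.

```python
import io
import pandas as pd

def load_index(source, symbol):
    """Read one exported price CSV and tag it with a display name."""
    df = pd.read_csv(source)
    df["Symbol"] = symbol
    df["Date"] = pd.to_datetime(df["Date"])
    # Prices may arrive as strings with thousands separators, e.g. "1,234.5"
    df["Price"] = (
        df["Price"].astype(str).str.replace(",", "", regex=False).astype(float)
    )
    return df

# In-memory stand-ins for CSV files (made-up values)
csv_a = io.StringIO('Date,Price\n2020-03-02,"1,200.5"\n2020-03-03,"1,180.0"\n')
csv_b = io.StringIO("Date,Price\n2020-03-02,950.0\n2020-03-03,940.5\n")

sources = {"Index A": csv_a, "Index B": csv_b}
stock_demo = pd.concat(
    [load_index(src, name) for name, src in sources.items()],
    ignore_index=True,
).sort_values(["Symbol", "Date"])
print(stock_demo)
```

With real files, the dictionary would map each display name to a file path, which also keeps every label next to the file it belongs to.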

#collapse-hide
line = alt.Chart(stockAll).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol:N',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockAll, title='Major Index and Primary Sectors Stocks Price (Denmark) ').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockAll).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)

#collapse-hide
# stock data preprocessing for France
# Top 40 companies in France
stockCAC40 = pd.read_csv(path+'CAC40.csv')

# importing sectors
CACbasic = pd.read_csv(path+'CACBasicMaterials.csv')
CACconsumer=pd.read_csv(path+'CACConsumerGoods.csv')
CACservice =pd.read_csv(path+'CACConsumerService.csv')
CACfinancial =pd.read_csv(path+'CACFinancials.csv')
CACutilities =pd.read_csv(path+'CACUtilities.csv')
CACtech =pd.read_csv(path+'CACTechnology.csv')
CAChealth =pd.read_csv(path+'CACHealthCare.csv')
CACoil =pd.read_csv(path+'CACOil&Gas.csv')
CACindustrial =pd.read_csv(path+'CACIndustrials.csv')
cacall = pd.read_csv(path+'CAC All Shares.csv')
#prepare the data for plotting
stockCAC40['Symbol']='CAC 40'
CACbasic['Symbol'] = 'CAC Basic Materials'
CACconsumer['Symbol'] = 'CAC Consumer Goods'
CACservice['Symbol'] = 'CAC Consumer Services'
CACfinancial['Symbol'] = 'CAC Financials'
CACutilities['Symbol'] = 'CAC Utilities'
CACtech['Symbol'] = 'CAC Technology'
CAChealth['Symbol'] = 'CAC Health Care'
CACoil['Symbol'] = 'CAC Oil & Gas'
CACindustrial['Symbol'] = 'CAC Industrials'
cacall['Symbol'] = 'France All Shares'
stockFRA = pd.concat([stockCAC40,CACbasic,CACconsumer,CACservice,CACfinancial,CACutilities,CACtech,
                     CAChealth,CACoil,CACindustrial,cacall],sort = True)
stockFRA['Date'] = pd.to_datetime(stockFRA.Date)
stockFRA = stockFRA.sort_values(by=['Symbol','Date'])
stockFRA['Price'] = stockFRA['Price'].str.replace(',','')
stockFRA['Price'] = stockFRA['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockFRA).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockFRA, title='Major Index & Primary Sectors Stocks Price(France)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockFRA).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)

#collapse-hide

#importing stocks for Italy
stockMIB = pd.read_csv(path+'FTSE MIB.csv')
utilities = pd.read_csv(path+'FTSE Italia Utilities.csv')
Telecommunications = pd.read_csv(path+'FTSE Italia Telecommunications.csv')
Technology = pd.read_csv(path+'FTSE Italia Technology.csv')
O_G = pd.read_csv(path+'FTSE Italia Oil & Gas.csv')
Travel = pd.read_csv(path+'FTSE Italia All Share Travel & Leisure.csv')
industrials = pd.read_csv(path+'FTSE Italia All Share Industrials.csv')
financials = pd.read_csv(path+'FTSE Italia All Share Financials.csv')
health = pd.read_csv(path+'FTSE Italia All Share Health Care.csv')
chemicals = pd.read_csv(path+'FTSE Italia All Share Chemicals.csv')
allsharesitalia = pd.read_csv(path+'FTSE Italia All Share.csv')
#prepare data for plotting
stockMIB['Symbol']='MIB'
utilities['Symbol'] = 'FTSE Utilities'
Telecommunications['Symbol'] = 'FTSE Telecommunications'
Technology['Symbol'] = 'FTSE Technology'
O_G['Symbol'] = 'FTSE Oil & Gas'
Travel['Symbol'] = 'FTSE Travel & Leisure'
industrials['Symbol'] = 'FTSE Industrials'
financials['Symbol'] = 'FTSE Financials'
health['Symbol'] = 'FTSE Health Care'
chemicals['Symbol'] = 'FTSE Chemicals'
allsharesitalia['Symbol'] = 'Italy All Shares'
stockITA = pd.concat([stockMIB,utilities,Telecommunications,Technology,O_G,Travel,
                     industrials,financials,health,chemicals,allsharesitalia],sort = True)
stockITA['Date'] = pd.to_datetime(stockITA.Date)
stockITA = stockITA.sort_values(by=['Symbol','Date'])
stockITA['Price'] = stockITA['Price'].str.replace(',','')
stockITA['Price'] = stockITA['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockITA).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockITA, title='Major Index & Primary Sectors Stocks Price(Italy)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockITA).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=500
)

#collapse-hide

#importing stocks for Spain
ibex = pd.read_csv(path+'IBEX 35 Historical Data.csv')
materials= pd.read_csv(path+'Madrid Basic Materials Industry and Construction Historical Data.csv')
consumer = pd.read_csv(path+'Madrid Consumer Goods Historical Data.csv')
service = pd.read_csv(path+'Madrid Consumer Services Historical Data.csv')
financial = pd.read_csv(path+'Madrid Financial Services & Real Estate Historical Data.csv')
petrol = pd.read_csv(path+'Madrid Petrol and Power Historical Data.csv')
technology = pd.read_csv(path+'Madrid Technology and Telecommunications Historical Data.csv')
spainall = pd.read_csv(path+'IBEX MAB All Share Historical Data.csv')

#prepare data for plotting
ibex['Symbol']='IBEX 35'
materials['Symbol'] = 'Basic Materials Industry and Construction'
consumer['Symbol'] = 'Consumer Goods'
service['Symbol'] = 'Services'
financial['Symbol'] = 'Financial Services & Real Estate'
petrol['Symbol'] = 'Petrol and Power'
technology['Symbol'] = 'Technology and Telecommunications'
spainall['Symbol'] = 'Spain All Shares'
stockSP = pd.concat([ibex,materials,consumer,service,financial,petrol,technology,spainall],sort = True)
stockSP['Date'] = pd.to_datetime(stockSP.Date)
stockSP = stockSP.sort_values(by=['Symbol','Date'])
stockSP['Price'] = stockSP['Price'].str.replace(',','')
stockSP['Price'] = stockSP['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockSP).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockSP, title='Major Index & Primary Sectors Stocks Price (Spain)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockSP).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300
)

#collapse-hide

#importing stocks for the UK
ftse100 = pd.read_csv(path+'FTSE 100 Historical Data.csv')
auto= pd.read_csv(path+'FTSE 350 - Automobiles & Parts Historical Data.csv')
forestry = pd.read_csv(path+'FTSE 350 - Forestry & Paper Historical Data.csv')
metals = pd.read_csv(path+'FTSE 350 - Industrial Metals & Mining Historical Data.csv')
telecom = pd.read_csv(path+'FTSE 350 - Mobile Telecommunications Historical Data.csv')
realestate = pd.read_csv(path+'FTSE 350 - Real Estate Historical Data.csv')
aerospace = pd.read_csv(path+'FTSE 350 Aerospace & Defense Historical Data.csv')
beverage = pd.read_csv(path+'FTSE 350 Beverages Historical Data.csv')
chemicalsuk = pd.read_csv(path+'FTSE 350 Chemicals Historical Data.csv')
construction = pd.read_csv(path+'FTSE 350 Construction & Building Materials Historical Data.csv')
ukall = pd.read_csv(path+'FTSE All-Share Historical Data.csv')

#prepare data for plotting
ftse100['Symbol']='FTSE 100'
auto['Symbol'] = 'Automobiles & Parts'
forestry['Symbol'] = 'Forestry & Paper'
metals['Symbol'] = 'Industrial Metals & Mining'
telecom['Symbol'] = 'Mobile Telecommunications'
realestate['Symbol'] = 'Real Estate'
aerospace['Symbol'] = 'Aerospace & Defense'
beverage['Symbol'] = 'Beverages'
ukall['Symbol'] = 'United Kingdom All Shares'
chemicalsuk['Symbol'] = 'Chemicals'
construction['Symbol'] = 'Construction & Building Materials'

stockUK = pd.concat([ftse100,auto,forestry,metals,telecom,realestate,aerospace,beverage,chemicalsuk,construction,ukall],sort = True)
stockUK['Date'] = pd.to_datetime(stockUK.Date)
stockUK = stockUK.sort_values(by=['Symbol','Date'])
stockUK['Price'] = stockUK['Price'].str.replace(',','')
stockUK['Price'] = stockUK['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockUK).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockUK, title='Major Index & Primary Sectors Stocks Price (UK)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockUK).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=500
)

#collapse-hide

#importing stocks for the USA
dow30 = pd.read_csv(path+'Dow Jones Industrial Average Historical Data.csv')
consumerus= pd.read_csv(path+'Dow Jones Consumer Goods Historical Data.csv')
servicesus = pd.read_csv(path+'Dow Jones Consumer Services Historical Data.csv')
financialsus = pd.read_csv(path+'Dow Jones Financials Historical Data.csv')
healthus = pd.read_csv(path+'Dow Jones Health Care Historical Data.csv')
industrialsus = pd.read_csv(path+'Dow Jones Industrials Historical Data.csv')
ogus = pd.read_csv(path+'Dow Jones Oil & Gas Historical Data.csv')
materialsus = pd.read_csv(path+'Dow Jones Basic Materials Historical Data.csv')
technologyus = pd.read_csv(path+'Dow Jones Technology Historical Data.csv')
telecomus = pd.read_csv(path+'Dow Jones Telecommunications Historical Data.csv')
utilitiesus = pd.read_csv(path+'Dow Jones Utilities Historical Data.csv')

#prepare data for plotting
dow30['Symbol']='Dow 30'
consumerus['Symbol'] = 'Consumer Goods'
servicesus['Symbol'] = 'Consumer Services'
financialsus['Symbol'] = 'Financials'
healthus['Symbol'] = 'Health Care'
industrialsus['Symbol'] = 'Industrials'
ogus['Symbol'] = 'Oil & Gas'
materialsus['Symbol'] = 'Materials'
technologyus['Symbol'] = 'Technology'
telecomus['Symbol'] = 'Telecommunications'
utilitiesus['Symbol'] = 'Utilities'

stockUS = pd.concat([dow30,consumerus,servicesus,financialsus,healthus,industrialsus,ogus,materialsus,technologyus,
                    telecomus,utilitiesus],sort = True)
stockUS['Date'] = pd.to_datetime(stockUS.Date)
stockUS = stockUS.sort_values(by=['Symbol','Date'])
stockUS['Price'] = stockUS['Price'].str.replace(',','')
stockUS['Price'] = stockUS['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockUS).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockUS, title='Major Index & Primary Sectors Stocks Price (USA)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockUS).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=300)

#collapse-hide

#importing stocks for China
shanghai = pd.read_csv(path+'Shanghai Composite Historical Data.csv')
szse= pd.read_csv(path+'SZSE Component Historical Data.csv')
oilch = pd.read_csv(path+'FTSE China - Oil Equipment Services & Distribution Historical Data.csv')
banksch = pd.read_csv(path+'FTSE China A 600 - Banks Historical Data.csv')
electricitych = pd.read_csv(path+'FTSE China A 600 - Electricity Historical Data.csv')
financialsch = pd.read_csv(path+'FTSE China A 600 - Financials Historical Data.csv')
gwch = pd.read_csv(path+'FTSE China A 600 - Gas & Water Multiutilities Historical Data.csv')
retailersch = pd.read_csv(path+'FTSE China A 600 - General Retailers Historical Data.csv')
lifeinsurancech = pd.read_csv(path+'FTSE China A 600 - Life Insurance Historical Data.csv')
mediach = pd.read_csv(path+'FTSE China A 600 - Media Historical Data.csv')
realestatech = pd.read_csv(path+'FTSE China A 600 - Real Estate Investment & Services Historical Data.csv')
scch = pd.read_csv(path+'FTSE China A 600 - Software & Computer Services Historical Data.csv')


#prepare data for plotting
shanghai['Symbol']='Shanghai Composite'
szse['Symbol'] = 'SZSE Component'
oilch['Symbol'] = 'Oil Equipment Services & Distribution'
banksch['Symbol'] = 'Banks'
electricitych['Symbol'] = 'Electricity'
financialsch['Symbol'] = 'Financials'
gwch['Symbol'] = 'Gas & Water'
retailersch['Symbol'] = 'General Retailers'
lifeinsurancech['Symbol'] = 'Life Insurance'
mediach['Symbol'] = 'Media'
realestatech['Symbol'] = 'Real Estate Investment & Services'
scch['Symbol'] = 'Software & Computer Services'


stockCH = pd.concat([shanghai,szse,oilch,banksch,electricitych,financialsch,gwch,retailersch,lifeinsurancech,
                    mediach,realestatech,scch],sort = True)
stockCH['Date'] = pd.to_datetime(stockCH.Date)
stockCH = stockCH.sort_values(by=['Symbol','Date'])
stockCH['Price'] = stockCH['Price'].str.replace(',','')
stockCH['Price'] = stockCH['Price'].astype(float)

# collapse-hide
line = alt.Chart(stockCH).mark_line(interpolate='basis').encode(
    x='Date',
    y='Price',
    color='Symbol',
)

nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['Date'], empty='none')

selectors = alt.Chart(stockCH, title='Major Index & Primary Sectors Stocks Price (China)').mark_point().encode(
    x='Date',
    opacity=alt.value(0)
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Price', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(stockCH).mark_rule(color='gray').encode(
    x='Date',
).transform_filter(
    nearest
)

# Put the five layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules, text
).properties(
    width=600, height=500)

GDP, Inflation & Unemployment Data

Annual rates of change in GDP, inflation, and unemployment for major countries, taken from IMF data, including forecasts for 2020 and 2021.

I have removed Germany because we are not including it in the analysis above; we still need data for the UK, if possible.

# collapse-hide
# data preprocessing
def extract_data(df, subject):
    """Reshape one IMF indicator from wide (one column per year) to long form."""
    dates = ['2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']
    values = []
    countries = []
    _dates = []
    for country in df.Country.unique():
        tmp = df.loc[df.Country == country]
        for date in dates:
            countries.append(country)
            _dates.append(date)
            # take the single value explicitly; float() on a Series is deprecated
            values.append(float(tmp[date].iloc[0]))

    rv = pd.DataFrame.from_dict({'Date': _dates, 'Country': countries, 'Value': values})
    rv['subject'] = subject
    return rv

unemploy = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Unemployment rate']
unemploy = extract_data(unemploy[unemploy.Country != 'Germany'], 'unemployment')
inflation = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Inflation, average consumer prices']
inflation = extract_data(inflation[inflation.Country != 'Germany'], 'inflation')
gdp = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Gross domestic product, constant prices']
gdp = extract_data(gdp[gdp.Country != 'Germany'], 'gdp')

# A dropdown filter
# use the filtered data so that Germany, which has no rows, is not offered
countries = list(gdp.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select")

filter_gdp = alt.Chart(gdp, width=300, height=300, title='GDP Growth of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)

# unemployment plot
filter_unemployment = alt.Chart(unemploy, width=300, height=300, title='Unemployment Change of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)

# inflation plot
filter_inflation = alt.Chart(inflation, width=300, height=300, title='Inflation Change of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)


(filter_gdp | filter_unemployment | filter_inflation)
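
The wide-to-long reshaping that `extract_data` performs with nested loops can also be expressed with `DataFrame.melt`. The following is a sketch under assumptions, not the notebook's own method: the helper name `to_long` is hypothetical, and the numbers in `wide` are invented for illustration and are not IMF figures.

```python
import pandas as pd

def to_long(df, subject, years):
    """Turn one-column-per-year data into Date / Country / Value rows."""
    long = df.melt(id_vars='Country', value_vars=years,
                   var_name='Date', value_name='Value')
    long['Value'] = long['Value'].astype(float)
    long['subject'] = subject
    return long.sort_values(['Country', 'Date']).reset_index(drop=True)

# Invented values, two placeholder countries and two years
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    '2019': [1.5, 2.0],
    '2020': [-3.0, -4.5],
})
long_gdp = to_long(wide, 'gdp', ['2019', '2020'])
print(long_gdp)
```

The result has the same `Date`, `Country`, `Value`, and `subject` columns that the charts above expect, although the column order differs from that of `extract_data`.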

References

  1. Investopedia